knitr::opts_chunk$set(eval = FALSE)

title : “Child language experience in a Tseltal Mayan village” shorttitle : “Child language experience in a Tseltal Mayan village”

author: - name : “Marisa Casillas” affiliation : “1” corresponding : yes # Define only one corresponding author address : “P.O. Box 310, 6500 AH Nijmegen, The Netherlands” email : “Marisa.Casillas@mpi.nl” - name : “Penelope Brown” affiliation : “1” - name : “Stephen C. Levinson” affiliation : “1”

affiliation: - id : “1” institution : “Max Planck Institute for Psycholinguistics” # - id : “2” # institution : “Cambridge University”

author_note: |

Add complete departmental affiliations for each author here. Each new line herein must be indented, like this line.

abstract: | Enter abstract here. Each new line herein must be indented, like this line.

keywords : “Child-directed speech, Linguistic input, Non-WEIRD, Vocal maturity, Turn taking” wordcount : “X”

bibliography : [“Tseltal-CLE.bib”]

figsintext : yes figurelist : no tablelist : no footnotelist : no lineno : yes mask : no

class : “man” output : papaja::apa6_pdf —

Introduction

A great deal of work in developmental language science revolves around one central question: What linguistic evidence (i.e., what types and how much) is needed to support first language acquisition? In pursuing this topic, many researchers have fixed their sights on the quantity and characteristics of speech addressed to children; that is, speech designed for young recipients who may have limited attention and understanding. In several languages, child-directed speech (CDS^1^) is linguistically accommodated to young listeners [@soderstrom2007beyond; @cristia2013input], interactionally rich [@bruner1985childs; @masataka2003onset; @butterworth2003pointing; @estigarribia2007getting], and preferred by infants [@cooper1990preference; @segal2015infant; @manybabies2017; @hoff2006social; @golinkoff2015baby]. In those same linguistic communities, these properties of CDS have been found to facilitate early word learning [e.g., @hoff2003specificity; @hurtado2008does; @rowe2008child; @shneidman2012language; @shneidman2012counts; @cartmill2013quality; @weisleder2013talking; @hirshpasek2015contribution]. Yet ethnographic reports from a number of traditional, non-Western communities suggest that children easily acquire their community’s language(s) even when they are rarely addressed directly [@brown2011cultural]. If so, large quantities of CDS may not be essential for learning language; they may merely facilitate certain aspects of language development. In this paper we investigate the language environment and early development of 10 Tseltal Mayan children growing up in a community where caregivers are reported to rarely address speech directly to infants and young children [@brown1998conversational; @brown2011cultural; @brown2014interactional].

Child-directed speech

Prior work on CDS in Western contexts has shown that the amount of CDS children hear influences their language development; more CDS is associated with larger and faster-growing receptive and productive vocabularies in young children [e.g., @hart1995meaningful; @hoff2003specificity; @hurtado2008does; @shneidman2012language; @shneidman2012counts; @weisleder2013talking; @ramirezesparza2014look; @ramirezesparza2017look; @peterIPindividual]. CDS has also been linked to young children’s speed of lexical retrieval [@hurtado2008does; @weisleder2013talking; but see @peterIPindividual] and syntactic development [@huttenlocher2010sources]. The conclusion drawn from much of this work is that speech directed to children is well designed for learning words, especially concrete nouns and verbs, because it is optimized for a child’s attention in the moment it is spoken. Even outside of first-person interaction, infants and young children prefer listening to attention-grabbing CDS over adult-directed speech [see @manybabies2017 for a review]. There are, however, a few significant caveats to the body of work relating CDS quantity to language development.

First, while there is overwhelming evidence linking CDS quantity to vocabulary size, evidence for links to grammatical development is scarcer [e.g., @huttenlocher2010sources; @frankIPvariability; @brinchmann2019direct]. While the advantage of CDS for referential word learning is clear, it is less obvious how CDS facilitates syntactic learning. For example, utterance length [a proxy for syntactic complexity; @wasow1997remarks] does not appear to increase with child age [@newport1977mother], and parents are less likely to directly correct their children’s syntactic errors than their semantic ones [@brown1997introduction; but see @chouinard2003adult]. Parents sometimes even produce ungrammatical utterances themselves to make individual words salient to their young interlocutors [@aslin1996models; see also @yurovsky2018communicative]. On the other hand, there is a wealth of evidence that syntactic knowledge is lexically specified [see, e.g., @lieven1997lexically; @goldberg2003constructions; @arnold2004avoiding], and that, crosslinguistically, children’s vocabulary size is one of the most robust predictors of their early syntactic development [@bates1997inseparability; @marchman2004language; @frankIPvariability]. In short, what is good for the lexicon may also be good for syntax. For now, however, the link between CDS and other aspects of grammatical development still needs to be more thoroughly tested.

A second caveat is that most work on CDS quantity uses summary measures that average over the ebb and flow of interaction (e.g., proportion CDS). In both child and adult interactions, verbal behaviors are highly structured: while some occur at fairly regular intervals (‘periodic’, e.g., discourse connectives such as ‘alright’, ‘okay’, and ‘well’), others occur in shorter, more intense bouts separated by long periods of inactivity (‘bursty’, e.g., content words, descriptions) [@abney2018bursts; see also @fusaroli2014synergy]. For example, Abney and colleagues [-@abney2017time] found that, across multiple time scales of daylong recordings, both infants’ and adults’ vocal behavior was clustered. Focusing on lexical development, Blasi and colleagues [-@blasiIPhuman] also found that nouns and verbs were used burstily in child-proximal speech across all six of the languages in their typologically diverse sample. Infrequent words were somewhat more bursty overall, leading them to propose that burstiness may play a key and universal role in acquiring otherwise-rare linguistic units. Experiment-based work also shows that two-year-olds learn novel words better from a massed presentation of object labels than from a distributed presentation [@schwab2016repetition; but see @ambridge2006distributed; @childers2002two]. Structured temporal characteristics in children’s language experience imply new roles for attention and memory in language development. For this reason, we should begin to investigate the link between CDS and linguistic development with more nuanced measures of how CDS is distributed.

Finally, prior work has typically focused on Western (primarily North American) populations, limiting our ability to generalize these effects to children acquiring language worldwide [@henrich2010beyond; @nielsen2017persistent; @lieven1994crosslinguistic; @brown2014language]. While we do gain valuable insight by looking at within-population variation (e.g., different socioeconomic or sub-cultures), we can more effectively find places where our assumptions break down by studying new populations. Linguistic anthropologists working in non-Western communities have long reported that caregiver interaction styles vary immensely from place to place, with some caregivers using little or no CDS to young children [@lieven1994crosslinguistic; @gaskins2006cultural; @brown2014language]. Children in these communities reportedly acquire language with ‘typical’-looking benchmarks. For example, they start pointing and talking around the same time we would expect for Western middle-class infants [e.g., @brown2011cultural; @liszkowski2012prelinguistic; @brown2014language; @brown2014interactional; but see also @salomo2013sociocultural]. These findings have had little impact on mainstream theories of word learning and language acquisition, partly due to a lack of directly comparable measures [@brown2014language; @brown2014interactional]. If, however, these children indeed acquire language without delay despite little or no CDS, we must reconsider what kind of linguistic evidence is necessary for children to learn language.

Language development in non-WEIRD communities

A growing number of researchers are using methods from developmental psycholinguistics to describe the language environments and linguistic development of children growing up in traditional and/or non-Western communities [see also, e.g., @demuth2010three; @barrett2013early; @hernik2018infant; @ganek2018using; @garcia2018thematic; @fortierURadhoc]. We briefly highlight two recent efforts along these lines, but see Cristia and colleagues [-@cristia2017child] and Mastin and Vogt’s work [-@vogt2015communicative; -@mastin2016infant] for similar examples.

Scaff, Cristia, and colleagues [-@cristia2017child; -@scaffIPlanguage] have used a number of methods to estimate how much speech children hear in a Tsimane forager-horticulturalist population in the Bolivian lowlands. From daylong audio recordings, they estimate that Tsimane children between 0;6 and 6;0 hear at most ~5 minutes of directly addressed speech per hour, regardless of their age [but see @cristia2017child]. For comparison, children from North American homes between ages 0;3 and 3;0 are estimated to hear ~11 minutes of CDS per hour in daylong recordings [@bergelsoncasillas2018what]. Tsimane children also hear ~10 minutes of other-directed speech per hour (e.g., talk between adults) compared to the ~7 minutes per hour heard by young North American children [@bergelsoncasillas2018what]. This difference may be attributable to the fact that the Tsimane live in extended family clusters of 3–4 households, so speakers are typically in close proximity to 5–8 other people [@cristia2017child].

Shneidman and colleagues [-@shneidman2010language; -@shneidman2012language] analyzed speech from one-hour at-home video recordings of children between ages 1;0 and 3;0 in two communities: Yucatec Mayan (Southern Mexico) and North American (a major U.S. city). Their analyses yielded four main findings: compared to the American children, (a) the Yucatec children heard many fewer utterances per hour, (b) a much smaller proportion of the utterances they heard were child-directed, (c) the proportion of utterances that were child-directed increased dramatically with age, matching U.S. children’s by age 3;0, and (d) most of the added CDS came from other children (e.g., older siblings and cousins). They also demonstrated that the lexical diversity of the CDS children heard at 24 months (particularly from adult speakers) predicted their vocabulary knowledge at 35 months.

These groundbreaking studies establish a number of important findings: First, children in each of these communities appear able to acquire their languages with relatively little CDS. Second, CDS might become more frequent as children get older, though this could largely be due to speech from other children. Finally, despite these differences, CDS from adults may still be the most robust predictor of vocabulary growth.

The current study

We examine the early language experience of 10 Tseltal Mayan children under age 3;0. Prior ethnographic work suggests that Tseltal caregivers do not frequently speak directly to their children until the children themselves begin to actively initiate verbal interactions [@brown2011cultural; @brown2014interactional]. Nonetheless, Tseltal children develop language with no apparent delays. Tseltal Mayan language and culture have much in common with the Yucatec Mayan communities Shneidman reports on, allowing us to compare differences in child language environments between the two sites more directly than before.^2^ We provide more details on this community and dataset in the Methods section.

Similar to previous work, we estimated how much other-directed speech children could have listened to, how much was directed to them, and how those quantities changed with age. To this foundation we added new sampling techniques for investigating variability in children’s speech environments within daylong recordings. We also analyzed children’s early vocal productions, examining both the overall developmental trajectory of their vocal maturity and how their vocalizations are influenced by CDS.

Based on prior work, we predicted that Tseltal Mayan children hear little CDS, that the amount of CDS they hear increases with age, that most CDS comes from other children, and that, despite this, Tseltal Mayan children reach speech production benchmarks on par with Western children. We additionally predicted that children’s language environments would be bursty—that brief, high-intensity interactions would be sparsely distributed throughout the day, accounting for the majority of children’s daily CDS. [TASK: REVISIT THIS]

Methods

Community

The children in our dataset come from a small-scale, subsistence farming community in the highlands of Chiapas in Southern Mexico. The vast majority of children grow up speaking Tseltal monolingually at home. The first few years of primary school are conducted mainly in Tseltal, but the remainder of primary school, secondary school, and any further education is conducted exclusively in Spanish. Nuclear families are often large (5+ children) and live in patrilineal clusters. Nearly all families grow staple crops such as corn and beans, but also bananas, chilies, squash, coffee, and more. Household and farming work is divided among men, women, and older children. Women do much of the daily cleaning and food preparation, but also frequently work in the garden, haul water and firewood, and do other physical labor. A few community members—both men and women—earn incomes as teachers and shopkeepers but are still expected to regularly contribute to their family’s household work.

More than forty years of ethnographic work by the second author has reported that Tseltal children’s language environments are non-child-centered and non-object-centered [@brown1998conversational; @brown2011cultural; @brown2014interactional]. During their waking hours, Tseltal infants are typically tied to their mother’s back while she goes about her work for the day. Infants receive very little direct speech until they themselves begin to initiate interactions, usually as they approach their first birthdays. Even then, interactional exchanges are often brief or non-verbal (e.g., object exchange routines) and take place within a multi-participant context [@brown2014interactional]. Rarely is attention given to words and their meanings, even when objects are central to the activity. Instead, interactions tend to focus on appropriate actions and responses, and young children are socialized to attend to the interactions taking place around them [see also @rogoff2003firsthand; @deleon2011language].

Young children are often cared for by other family members, especially older siblings. Even when not on their mother’s back, infants are rarely put on the ground, so they usually cannot pick up the objects around them until they are old enough to walk. Toys are scarce and books are vanishingly rare, so the objects children do get their hands on tend to be natural or household objects (e.g., rocks, sticks, spoons, baskets). By age five, most children are competent speakers who engage daily in chores and caregiving of their younger siblings. The Tseltal approach to caregiving is similar to that described for other Mayan communities [e.g., @pye1986quiche; @rogoff1993guided; @gaskins1996how; @deleon1998emergent; @gaskins1999childrens; @rogoff2003firsthand; @deleon2011language; @shneidman2012language].

Corpus

The current data come from the Casillas HomeBank Corpus [@Casillas-HB; @HomeBank], which includes daylong recordings and other developmental language data from more than 100 children under 4;0 across two indigenous, non-WEIRD communities: the Tseltal Mayan community described here and a Papua New Guinean community described elsewhere [@brown2011cultural; @brown2014interactional; @brownIPchildrearing].

[TASK: Check these demographic data again] The Tseltal data, primarily collected in 2015, include recordings from 55 children born to 43 mothers. The families in our dataset typically had only 2–3 children (median = 2; range = 1–9) because the participating families come from a young subsample of the community (mothers: mean = 26.9 years; median = 25.9; range = 16.6–43.8; fathers: mean = 30.5; median = 27.6; range = 17.7–52.9). On average, mothers were 20.1 years old when they had their first child (median = 19; range = 12–27), with a subsequent inter-child interval of 3.04 years (median = 2.8; range = 1–8.5). As a result, 26% of the participating families had two children under 4;0. To our knowledge at the time of recording, all children were typically developing. We calculated children’s exact ages from the birthdates given by their caregivers, though these should be treated with caution because documentation and reporting of birthdates is less rigorous than is typically expected for studies based on Western post-industrial populations.

Household size, defined in our dataset as the number of people sharing a kitchen or other primary living space, ranged between 3 and 15 people (mean = NN; median = NN). Although 30.9% of the target children are first-born, they were rarely the only child in their household. Caregiver education is one (imperfect) measure of contact with Western culture. Most mothers had finished primary school, with many also having completed secondary school (range = no schooling–university). Most fathers had finished secondary school, with many having also completed preparatory school (range = no schooling–university). Clan membership influences marriage and land inheritance such that 93% of the fathers grew up in the village where the recordings took place, while only 53% of the mothers did.

Recordings

Methods for estimating the quantity of speech that children hear have advanced significantly in the past two decades, with long-format at-home audio recordings quickly becoming the new standard [e.g., with the LENA^®^ system; @greenwood2011assessing]. These recordings capture a wider range of the linguistic patterns children hear as they participate in different activities with different speakers over the course of their day. In long-format recordings, caregivers also tend to use less CDS [@tamislemonda2017power; see also @bergelson2018day].^3^ The goal of these recordings is to capture a roughly representative sample of what the child hears and says at home.

Figure 1. The recording vest fit over children’s chests with an audio recording device in the front horizontal pocket and a camera fitted with a fisheye lens attached to a shoulder strap.

We used a novel combination of a lightweight stereo audio recorder (Olympus® WS-832) and a wearable photo camera (Narrative Clip 1®) fitted with a fisheye lens to track children’s movements and interactions over the course of a 9–11-hour period in which the experimenter was not present. Each recording was made during a single day at home in which the recorder and/or camera was attached to the child. Ambulatory children wore both devices on an elastic vest. Non-ambulatory children wore the recorder in a onesie while their primary caregiver wore the camera on an elastic vest (see Figure 1). The camera was set to take photos at 30-second intervals and was synchronized to the audio in post-processing to create a video file featuring the snapshot-linked audio from the child’s recording.^4^

Data selection and annotation

We annotated video clips from 10 of the 55 children’s recordings. We chose these 10 recordings to maximize variance in three demographic variables: child age (0–3;0), child sex, and maternal education. The sample is summarized in Table 1 [TASK: Make table]. We then selected one hour’s worth of non-overlapping clips from each recording in the following order: nine randomly selected 5-minute clips, five 1-minute clips manually selected as the top ‘turn-taking’ minutes of the recording, five 1-minute clips manually selected as the top ‘vocal activity’ minutes of the recording, and one manually selected 5-minute extension of the best 1-minute sample (see Figure 2). We created these different subsamples of each day to measure properties of (a) children’s average language environments (random samples) and (b) their most input-dense language environments (turn-taking samples). The third sample (vocal activity) gave us insight into children’s productive speech abilities.
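The random part of this sampling scheme can be sketched in a few lines of base R. This is only an illustration, not the procedure used for the corpus: the recording length, the seed, and the choice to draw start times from a 5-minute grid are all assumptions made here to guarantee non-overlapping clips.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

recording_min <- 10 * 60   # hypothetical 10-hour recording
clip_min <- 5              # length of each random clip, in minutes
n_random <- 9              # number of random clips per recording

# Candidate start times on a 5-minute grid (an assumption of this sketch);
# sampling grid slots without replacement means the clips cannot overlap.
starts <- seq(0, recording_min - clip_min, by = clip_min)
random_clips <- data.frame(start_min = sort(sample(starts, n_random)))
random_clips$end_min <- random_clips$start_min + clip_min

# Annotated minutes per recording under the scheme described in the text:
# 9 x 5 (random) + 5 x 1 (turn taking) + 5 x 1 (vocal activity) + 1 x 5 (extension)
total_min <- n_random * clip_min + 5 * 1 + 5 * 1 + 1 * 5
total_min
```

The final tally confirms that the four clip types together add up to one hour of annotated audio per child.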

The turn-taking and vocal activity clips were chosen by two trained annotators (the first author and a student assistant) who listened to each recording in its entirety at 1–2x speed while actively taking notes about potentially useful clips. Afterwards, the first author reviewed the list of candidate clips, listened again to each one (at 1x speed, multiple repetitions), and chose the best five 1-minute samples for each of the two types of activity. Good turn-taking activity was defined as closely timed sequences of contingent vocalization between the target child and at least one other person (i.e., frequent vocalization exchanges). The ‘best’ turn-taking clips were chosen because they had the most and clearest turn-switching activity between the target child and the other speaker(s). Good vocal activity clips were defined as clips in which the target child produced the most and most diverse spontaneous (i.e., not imitative) vocalizations. The ‘best’ vocal activity clips were chosen for representing the most linguistically mature and/or diverse vocalizations made by the child over the day. All else being equal, candidate clips were prioritized when they contained less background noise or featured speakers and speech that were not otherwise frequently represented (e.g., CDS from older males). The best turn-taking clips and vocal activity clips often overlapped; turn-taking clips were selected from the list of candidates first, and then vocal activity clips were chosen from the remainder. The instructions for selecting clips and resulting notes can be found at https://github.com/marisacasillas/Tseltal-CLE/blob/master/audio_scanning_instructions.md.

Figure 2. Recording duration (black line) and sampled clips (colored boxes) for each recording analyzed, sorted by child age.

Each video clip was transcribed and annotated in ELAN [@ELAN] using the ACLEW Annotation Scheme [@casillas2017ACLEWDAS] by the first author and a native speaker of Tseltal who lives in the community and knows most of the recorded families personally. The annotations include the transcription of (nearly) all audible utterances in Tseltal, a loose translation of each utterance into Spanish, vocal maturity measures of each target child utterance (non-linguistic vocalizations/non-canonical babbling/non-word canonical babbling/single words/multiple words), and addressee annotations for all non-target-child utterances (target-child-directed/other-child-directed/adult-directed/adult-and-child-directed/animal-directed/other-speaker-type-directed). We annotated each utterance for intended addressee using contextual interactional information from the photos, audio, and preceding/following footage; we used an ‘unsure’ category for utterances without sufficient evidence for confident classification.^5^ We exported each ELAN file as tab-separated values for analysis.

Data analysis

In what follows, we first describe quantitative characteristics of children’s speech environments, as captured by the nine randomly selected 5-minute clips for each child. We report five measures: target-child-directed speech (TCDS) and other-directed speech (ODS) minutes per hour, the number of target-child-to-other (TC–O) and other-to-target-child (O–TC) turn transitions per minute, and the duration of the target child’s interactional sequences in seconds. We then briefly review these same speech environment characteristics for the 5–6 one- or five-minute turn-taking clips,^6^ which represent ‘peak’ interactional moments in the day, and investigate how many minutes in the day are likely to have these characteristics.
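As a rough illustration of the first two measures, the sketch below converts utterance-level annotations for one clip into TCDS and ODS minutes per hour. The data frame, its column names, and the addressee codes are hypothetical stand-ins for the exported annotations, and the sketch simply sums utterance durations, so any overlapping speech would be counted more than once.

```r
# Hypothetical annotations for one 5-minute clip: one row per utterance,
# with onset/offset in milliseconds and an addressee code
# ("T" = target child, "O" = anyone else). Column names are assumptions.
utts <- data.frame(
  onset_ms  = c(1000, 20000, 65000, 120000, 200000),
  offset_ms = c(4000, 26000, 71000, 129000, 212000),
  addressee = c("T", "O", "O", "T", "O")
)
clip_min <- 5

# Summed utterance duration per addressee type, in minutes
dur_min <- (utts$offset_ms - utts$onset_ms) / 60000
min_by_addressee <- tapply(dur_min, utts$addressee, sum)

# Scale from minutes per clip to minutes per hour
mph <- min_by_addressee / clip_min * 60
tcds_mph <- unname(mph["T"])
ods_mph  <- unname(mph["O"])
c(TCDS = tcds_mph, ODS = ods_mph)
```

Per-child estimates would then be obtained by aggregating these clip-level rates across the clips sampled from each recording.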

Results

[TASK: change fits in the figures to reflect model estimates]

Data analysis

Unless otherwise stated, all analyses were conducted with generalized linear mixed-effects regressions using the glmmTMB package, and all plots were generated with ggplot2 in R [@R-glmmTMB; @R-ggplot2; @R-base].^7^ Notably, all five speech environment measures are restricted to non-negative values (min/hr, turn transitions/min, and duration in seconds), and a subset of them also show extra cases of zero in the randomly sampled clips (min/hr, turn transitions/min; e.g., when the child is napping). As a consequence of these boundary restrictions, the distributions are non-Gaussian (i.e., they have a long right tail). We account for this issue by using negative binomial regression, which is useful for overdispersed count data [@brooks2017modeling; @smithson2013generalized]. When extra cases of zero were present due to, e.g., no speakers being present, we used a zero-inflated negative binomial regression, which fits two models: (a) a binary model to evaluate the likelihood of none vs. some presence of the variable (e.g., TCDS) and (b) a count model of the variable (e.g., ‘3’ vs. ‘5’ TCDS min/hr) that assumes a negative binomial distribution. Alternative analyses using Gaussian models with logged dependent variables are available in the Supplementary Materials; their results are qualitatively similar to those we report here.
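The two properties that motivate this modeling choice, overdispersion and excess zeros, can be illustrated with simulated data. The sketch below uses base R only; every parameter value is hypothetical and chosen for illustration rather than estimated from the Tseltal recordings.

```r
set.seed(42)  # arbitrary seed

n <- 2000
p_structural_zero <- 0.25   # hypothetical chance that no speech is possible (e.g., a nap)
mu <- 4                     # hypothetical mean of the count process
theta <- 1.5                # hypothetical negative binomial dispersion ('size')

# Two-part process: a binary draw for structural zeros, then a
# negative binomial count for the remaining observations
is_zero <- rbinom(n, 1, p_structural_zero) == 1
y <- ifelse(is_zero, 0, rnbinom(n, mu = mu, size = theta))

# Overdispersion: the variance far exceeds the mean (they are equal under Poisson)
c(mean = mean(y), variance = var(y))

# Zero inflation: more zeros than the count process alone would produce
c(observed = mean(y == 0), nb_only = dnbinom(0, mu = mu, size = theta))
```

A zero-inflated negative binomial regression estimates both parts at once: the probability of a structural zero and the mean of the count process.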

Our primary predictors were as follows: child age (months), household size (number of people), and number of non-target-child speakers present in that clip, all centered and standardized, plus squared time of day at the start of the clip (in decimal hours; centered on noon and standardized). We always used squared time of day to model the cycle of activity at home: the mornings and evenings should be more similar to each other than to midday because people tend to disperse for chores after breakfast. To these we added two-way interactions between child age and each of number of speakers present, household size, and time of day. We also included a random effect of child, with random slopes of time of day, unless doing so resulted in model non-convergence. For the zero-inflation models, we included child age, number of speakers present, and time of day. We note below when models needed to deviate from this core design to achieve convergence. We only report significant effects here; full model outputs are available in the Supplementary Materials.
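A minimal sketch of this model specification follows, using simulated stand-in data. All variable names and values are hypothetical. The order of operations for time of day (center on noon, standardize, then square) is one reading of the description above, and the random slopes are omitted to keep the example small. The model is fitted only if glmmTMB is installed; `ziformula` and `nbinom2` are the package's arguments for the zero-inflation component and the negative binomial family.

```r
set.seed(7)  # arbitrary seed

# Simulated stand-in data: 10 children x 9 random clips (all values hypothetical)
d <- data.frame(
  child    = factor(rep(1:10, each = 9)),
  age_mo   = rep(round(seq(2, 36, length.out = 10)), each = 9),
  hh_size  = rep(sample(3:15, 10, replace = TRUE), each = 9),
  n_spkrs  = rpois(90, 2),
  start_hr = runif(90, 7, 18)   # clip start time in decimal hours
)
# Integer-valued response with extra zeros (assumes min/hr rounded to whole numbers)
d$tcds_mph <- rnbinom(90, mu = 4, size = 1.5) * rbinom(90, 1, 0.8)

# Center and standardize the continuous predictors
z <- function(x) as.numeric(scale(x))
d$age_z   <- z(d$age_mo)
d$hh_z    <- z(d$hh_size)
d$spkrs_z <- z(d$n_spkrs)

# Time of day: center on noon, standardize, then square, so that mornings
# and evenings receive similar values and midday values are near zero
tod <- d$start_hr - 12
d$tod_sq <- (tod / sd(tod))^2

# Main effects plus two-way interactions with age; random intercept by child
f_count <- tcds_mph ~ age_z * (spkrs_z + hh_z + tod_sq) + (1 | child)
f_zero  <- ~ age_z + spkrs_z + tod_sq

if (requireNamespace("glmmTMB", quietly = TRUE)) {
  m <- glmmTMB::glmmTMB(f_count, ziformula = f_zero,
                        family = glmmTMB::nbinom2, data = d)
  print(summary(m))
}
```

Because the simulated response is unrelated to the predictors, the fitted coefficients are not meaningful; the sketch only shows how the predictors and the two model components fit together.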

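# Packages used by the plotting code in this section
library(dplyr)    # select(), mutate(), bind_rows(), %>%
library(ggplot2)
library(viridis)  # viridis() color palette
library(papaja)   # theme_apa()

# Combine the turn-taking and random samples into one data frame for plotting.
# The input data frames (quantity.nonrand.tt, quantity.rand, and their
# speaker-age counterparts) are assumed to exist already.
quantity.nonrand.tt.minimum <- dplyr::select(quantity.nonrand.tt,
                                             age_mo_round, xds_mph, ods_mph, tds_mph,
                                             prop_tds, n_spkrs_clip) %>%
                                             mutate(Sample = "Turn taking")
quantity.rand.minimum <- dplyr::select(quantity.rand,
                                       age_mo_round, xds_mph, ods_mph, tds_mph,
                                       prop_tds, n_spkrs_clip) %>%
                                       mutate(Sample = "Random")
quantity.rand_and_tt <- bind_rows(quantity.nonrand.tt.minimum, quantity.rand.minimum)

# Same combination for the proportion of TCDS by speaker age (child vs. adult)
quantity.nonrand.sa.tt.minimum <- dplyr::select(quantity.nonrand.tt.sa,
                                             age_mo_round, prop_sa.tds, SpkrAge) %>%
                                             mutate(Sample = "Turn taking")
quantity.rand.sa.minimum <- dplyr::select(quantity.rand.sa,
                                       age_mo_round, prop_sa.tds, SpkrAge) %>%
                                       mutate(Sample = "Random")
quantity.sa.rand_and_tt <- bind_rows(quantity.nonrand.sa.tt.minimum, quantity.rand.sa.minimum)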

# ODS min/hr
odsmph.segments.rand_and_tt <- ggplot(quantity.rand_and_tt,
                          aes(x = age_mo_round, y = ods_mph, lty = Sample)) +
  geom_boxplot(aes(group = interaction(age_mo_round, Sample),
                   color = Sample), fill = "white", outlier.shape = NA,
               lty = "solid", alpha = 0.4) +
  geom_smooth(aes(fill = Sample, color = Sample), method = "lm") +
  ylab("ODS (min/hr)") + xlab("Child age (mo)") +
  scale_y_continuous(limits=c(-10,80),
                     breaks=seq(0,80,20)) +
  scale_x_continuous(limits=c(0,38),
                     breaks=seq(0,38,6)) +
  coord_cartesian(ylim=c(0,80),xlim=c(0,38)) +
  scale_color_manual(values = viridis(3)) +
  scale_fill_manual(values = viridis(3)) +
  theme_apa() +
  theme(legend.position="none",
        axis.line = element_line(color="black", size = 0.4))

# TDS min/hr
tdsmph.segments.rand_and_tt <- ggplot(quantity.rand_and_tt,
                          aes(x = age_mo_round, y = tds_mph, lty = Sample)) +
  geom_boxplot(aes(group = interaction(age_mo_round, Sample),
                   color = Sample), fill = "white", outlier.shape = NA,
               lty = "solid", alpha = 0.4) +
  geom_smooth(aes(fill = Sample, color = Sample), method = "lm") +
  ylab("TCDS (min/hr)") + xlab("Child age (mo)")    +
  scale_y_continuous(limits=c(0,80),
                     breaks=seq(0,80,20)) +
  scale_x_continuous(limits=c(0,38),
                     breaks=seq(0,38,6)) +
  coord_cartesian(ylim=c(0,80),xlim=c(0,38)) +
  scale_color_manual(values = viridis(3)) +
  scale_fill_manual(values = viridis(3)) +
  theme_apa() +
  theme(legend.position="none",
        axis.line = element_line(color="black", size = 0.4))

# TDS min/hr - zoomed in
tdsmph.segments.rand_and_tt.zoomedin <- ggplot(quantity.rand_and_tt,
                          aes(x = age_mo_round, y = tds_mph, lty = Sample)) +
  geom_boxplot(aes(group = interaction(age_mo_round, Sample),
                   color = Sample), fill = "white", outlier.shape = NA,
               lty = "solid", alpha = 0.4) +
  geom_smooth(aes(fill = Sample, color = Sample), method = "lm") +
  ylab("TCDS (min/hr)") + xlab("Child age (mo)")    +
  scale_y_continuous(limits=c(0,40),
                     breaks=seq(0,40,10)) +
  scale_x_continuous(limits=c(0,38),
                     breaks=seq(0,38,6)) +
  coord_cartesian(ylim=c(0,40),xlim=c(0,38)) +
  scale_color_manual(values = viridis(3)) +
  scale_fill_manual(values = viridis(3)) +
  theme_apa() +
  theme(legend.position="none",
        axis.line = element_line(color="black", size = 0.4))

# TDS prop
tdsprp.segments.rand_and_tt <- ggplot(quantity.rand_and_tt,
                          aes(x = age_mo_round, y = prop_tds, lty = Sample)) +
  geom_boxplot(aes(group = interaction(age_mo_round, Sample),
                   color = Sample), fill = "white", outlier.shape = NA,
               lty = "solid", alpha = 0.4) +
  geom_smooth(aes(fill = Sample, color = Sample), method = "lm") +
  ylab("TCDS/All spch") + xlab("Child age (mo)")    +
  scale_y_continuous(limits=c(-.2,1.2),
                     breaks=seq(0,1,0.2)) +
  scale_x_continuous(limits=c(0,38),
                     breaks=seq(0,38,6)) +
  coord_cartesian(ylim=c(0,1),xlim=c(0,38)) +
  scale_color_manual(values = viridis(3)) +
  scale_fill_manual(values = viridis(3)) +
  theme_apa() +
  theme(legend.position="none",
        axis.line = element_line(color="black", size = 0.4))

# prop TDS from children
tdsprp.segments.rand_and_tt.sa <- ggplot(subset(quantity.sa.rand_and_tt, SpkrAge == "Child"),
                             aes(x = age_mo_round, y = prop_sa.tds, lty = Sample)) +
  geom_boxplot(aes(group = interaction(age_mo_round, Sample),
                   color = Sample), fill = "white", outlier.shape = NA,
               lty = "solid", alpha = 0.4) +
  geom_smooth(aes(fill = Sample, color = Sample), method = "lm") +
  ylab("Prop of TCDS") + xlab("Child age (mo)") +
  scale_y_continuous(limits=c(-.2,1.2),
                     breaks=seq(0,1,0.2)) +
  scale_x_continuous(limits=c(0,38),
                     breaks=seq(0,38,6)) +
  coord_cartesian(ylim=c(0,1), xlim=c(0,38)) +
  scale_color_manual(values = viridis(3)) +
  scale_fill_manual(values = viridis(3)) +
  theme_apa() +
  theme(legend.position="none",
        axis.line = element_line(color="black", size = 0.4))
By-child estimates of minutes per hour of other-directed speech (left) and target-child-directed speech (right). Data are shown for the random (purple; solid) and turn taking (green; dashed) samples. Bands on the solid linear trends show 95% CIs.


Target-child-directed speech (TCDS)

The Tseltal children in our study were directly spoken to for an average of 3.63 minutes per hour in the random sample (median = 4.08; range = 0.83–6.55; Figure 3). These estimates are close to those reported for Yucatec Mayan data [@shneidman2012language], which are plotted with our data, along with estimates from a few other populations, in Figure 4 [US/Canada: @bergelsoncasillas2018what; US urban and Yucatec: @shneidman2010language; Mozambique urban and rural, and Dutch: @vogt2015communicative; Tsimane: @scaffIPlanguage; see @scaffIPlanguage for a more detailed comparison].8 We modeled TCDS min/hr in the random clips with a zero-inflated negative binomial regression, as described above.

TCDS rate reported from daylong recordings made in different populations, including both urban (gray) and rural/indigenous (black) samples. Each point is the average TCDS rate reported for children at the indicated age, and size indicates number of children sampled (range: 1--26). See text for references to original studies.


## TCDS random sample ####
#ggplot(quantity.rand, aes(round(tds_mph,0))) + geom_histogram()
#sd(round(quantity.rand$tds_mph,0))^2
#mean(round(quantity.rand$tds_mph,0))
# mean is much smaller than variance
tds.rand.zinb <- glmmTMB(round(tds_mph,0) ~
                           tchiyr.std +
                           I(stthr.std^2) +
                           hsz.std +
                           nsk.std +
                           tchiyr.std:I(stthr.std^2) +
                           tchiyr.std:hsz.std +
                           tchiyr.std:nsk.std +
                           (1+I(stthr.std^2)|aclew_child_id),
                         data=quantity.rand,
                         ziformula=~tchiyr.std,#nsk.std,I(stthr.std^2)
                         family="nbinom1")
#res = simulateResiduals(tds.rand.zinb)
#plot(res, rank = T)
#summary(tds.rand.zinb)
#I(stthr.std^2)              4.3559     1.9268   2.261  0.02378 *  
#nsk.std                     0.2439     0.1346   1.812  0.06999 .  
#tchiyr.std:I(stthr.std^2)  -5.2257     1.9824  -2.636  0.00839 ** 
# ZI effects (only marginal)
#tchiyr.std    -7.767      4.160  -1.867   0.0619 .
# save for reporting
tds.rand.zinb.COEF.time <-
  coef(summary(tds.rand.zinb))[[1]]["I(stthr.std^2)",] 
tds.rand.zinb.COEF.agetime <-
  coef(summary(tds.rand.zinb))[[1]]["tchiyr.std:I(stthr.std^2)",]
tds.rand.zinb.COEF.ZI.age <-
  coef(summary(tds.rand.zinb))[[2]]["tchiyr.std",] 

## TCDS tt sample ####
#ggplot(quantity.nonrand.tt, aes(round(tds_mph,0))) + geom_histogram()
#sd(round(quantity.nonrand.tt$tds_mph,0))^2
#mean(round(quantity.nonrand.tt$tds_mph,0))
# mean is much smaller than variance
# not zero-inflated (nature of this sample)
tds.tt.nb <- glmmTMB(round(tds_mph,0) ~
                       tchiyr.std +
                       I(stthr.std^2) +
                       hsz.std +
                       nsk.std +
                       tchiyr.std:I(stthr.std^2) +
                       tchiyr.std:hsz.std +
                       tchiyr.std:nsk.std +
                       (1|aclew_child_id), #I(stthr.std^2)
                     data=quantity.nonrand.tt,
                     family="nbinom1")
#res = simulateResiduals(tds.tt.nb)
#plot(res, rank = T)
#summary(tds.tt.nb)
#tchiyr.std:I(stthr.std^2) -2.75472    1.61153  -1.709   0.0874 .  
#tchiyr.std:hsz.std        -0.37256    0.21505  -1.732   0.0832 .

The rate of TCDS in the randomly sampled clips was primarily affected by factors relating to the time of day. The count model showed that, overall, children were more likely to hear TCDS in the mornings and evenings than around midday (B = 4.32, SD = 1.92, z = 2.25, p = 0.02). However, this pattern weakened for older children, some of whom even heard peak TCDS input around midday, as illustrated in Figure 5 (B = -5.22, SD = 1.97, z = -2.64, p = 0.01). There were no significant effects of child age, household size, or number of speakers present, and no significant effects in the zero-inflation model.9
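
One way to read the two time-of-day coefficients together is as a curvature that changes with child age. The sketch below is illustrative only. It plugs in the rounded estimates reported above, assumes that `stthr.std` is centered near midday and that `tchiyr.std` is centered and scaled (neither is confirmed in this section), and ignores the by-child random slopes. The names `b_time`, `b_agetime`, `curvature`, and `flip_age` are hypothetical.

```r
# Rounded count-model estimates taken from the text above
b_time    <- 4.32   # I(stthr.std^2)
b_agetime <- -5.22  # tchiyr.std:I(stthr.std^2)

# Coefficient on squared time of day (log scale) at a given standardized age.
# Positive values mean more TCDS in mornings/evenings than at midday;
# negative values mean a midday peak.
curvature <- function(age_std) b_time + b_agetime * age_std

curvature(c(-1, 0, 1))  # 9.54, 4.32, -0.90

# Standardized age at which the time-of-day pattern changes sign
flip_age <- -b_time / b_agetime
```

Under these assumptions the squared term changes sign at a standardized age of roughly 0.83, which is consistent with some older children hearing peak TCDS around midday.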

TCDS rate heard at different times of day by children 12 months and younger (left) and 13 months and older (right) in the randomly selected (purple) and turn-taking (green) clips.


# Aggregate to one point per child and then test a correlation with age
# random
propchitcds.rand <- subset(quantity.rand.sa[
  which(!is.na(quantity.rand.sa$prop_sa.tds)),],
  SpkrAge == "Child")
propchitcds.rand.corr_age <- propchitcds.rand %>%
  group_by(aclew_child_id, tchiyr.std, age_mo_round) %>%
  summarise(avg_prpchitcds = mean(prop_sa.tds))
propchitcds.rand.corr_age.test <- cor.test(
  ~ age_mo_round + avg_prpchitcds,
  data = propchitcds.rand.corr_age, method = "spearman")
# turn taking
propchitcds.tt <- subset(quantity.nonrand.tt.sa[
  which(!is.na(quantity.nonrand.tt.sa$prop_sa.tds)),],
  SpkrAge == "Child")
propchitcds.tt.corr_age <- propchitcds.tt %>%
  group_by(aclew_child_id, tchiyr.std, age_mo_round) %>%
  summarise(avg_prpchitcds = mean(prop_sa.tds))
propchitcds.tt.corr_age.test <- cor.test(
  ~ age_mo_round + avg_prpchitcds,
  data = propchitcds.tt.corr_age, method = "spearman")

In contrast to findings from @shneidman2012language on Yucatec Mayan, most TCDS in the current data came from adult speakers (mean = 80.61%, median = 87.22%, range = 45.9%–100%), with no evidence for an increase in proportion TCDS from children with target child age (correlation between child age and proportion TCDS from children: Spearman’s rho = -0.29; p = 0.42).

Other-directed speech (ODS)

Children heard an average of 21.05 minutes of ODS per hour in the random sample (median = 17.8; range = 3.57–42.8); that is, 5–6 times as much speech as was directed to them. We modeled ODS min/hr in the random clips with a zero-inflated negative binomial regression, as described above.

## ODS random sample ####
#ggplot(quantity.rand, aes(round(ods_mph,0))) + geom_histogram()
#sd(round(quantity.rand$ods_mph,0))^2
#mean(round(quantity.rand$ods_mph,0))
# mean is much smaller than variance
ods.rand.zinb <- glmmTMB(round(ods_mph,0) ~
                           tchiyr.std +
                           I(stthr.std^2) +
                           hsz.std +
                           nsk.std +
                           tchiyr.std:I(stthr.std^2) +
                           tchiyr.std:hsz.std +
                           tchiyr.std:nsk.std +
                           (1|aclew_child_id), #I(stthr.std^2)
                         data=quantity.rand,
                         ziformula=~tchiyr.std+I(stthr.std^2),#nsk.std
                         family="nbinom1")
#res = simulateResiduals(ods.rand.zinb)
#plot(res, rank = T)
#summary(ods.rand.zinb)
#I(stthr.std^2)             2.71796    1.14911   2.365   0.0180 *  
#nsk.std                    1.05632    0.09152  11.542   <2e-16 ***
#tchiyr.std:I(stthr.std^2)  2.19282    1.23831   1.771   0.0766 .  
#tchiyr.std:hsz.std         0.32612    0.16254   2.006   0.0448 *  
# No significant ZI effects
# save for reporting
ods.rand.zinb.COEF.time <-
  coef(summary(ods.rand.zinb))[[1]]["I(stthr.std^2)",] 
ods.rand.zinb.COEF.nsk <-
  coef(summary(ods.rand.zinb))[[1]]["nsk.std",] 
ods.rand.zinb.COEF.agehsz <-
  coef(summary(ods.rand.zinb))[[1]]["tchiyr.std:hsz.std",]

## ODS tt sample ####
#ggplot(quantity.nonrand.tt, aes(round(ods_mph,0))) + geom_histogram()
#sd(round(quantity.nonrand.tt$ods_mph,0))^2
#mean(round(quantity.nonrand.tt$ods_mph,0))
# mean is much smaller than variance
# still zero-inflated
ods.tt.zinb <- glmmTMB(round(ods_mph,0) ~
                         tchiyr.std +
                         I(stthr.std^2) +
                         hsz.std +
                         nsk.std +
                         tchiyr.std:I(stthr.std^2) +
                         tchiyr.std:hsz.std +
                         tchiyr.std:nsk.std +
                         (1|aclew_child_id), #I(stthr.std^2)
                       data=quantity.nonrand.tt,
                       ziformula=~tchiyr.std+nsk.std, #I(stthr.std^2)
                       family="nbinom1")
#res = simulateResiduals(ods.tt.zinb)
#plot(res, rank = T)
#summary(ods.tt.zinb)
#tchiyr.std                -0.49087    0.19113  -2.568   0.0102 *  
#nsk.std                    0.59379    0.10115   5.871 4.34e-09 ***
# No significant ZI effects (only marginal)
#tchiyr.std     -4.496      2.620  -1.716   0.0862 .
# save for reporting
ods.tt.zinb.COEF.age <-
  coef(summary(ods.tt.zinb))[[1]]["tchiyr.std",] 
ods.tt.zinb.COEF.nsk <-
  coef(summary(ods.tt.zinb))[[1]]["nsk.std",] 

The count model of ODS in the randomly selected clips revealed that the presence of more speakers was strongly associated with more ODS (B = 1.06, SD = 0.09, z = 11.54, p < 0.001). Additionally, more ODS occurred in the mornings and evenings (B = 2.7, SD = 1.14, z = 2.36, p = 0.02), and was also more frequent in large households for older children compared to younger children (B = 0.33, SD = 0.16, z = 2.01, p = 0.04). There were no other significant effects on ODS rate, and no significant effects in the zero-inflation model.10

Other-directed speech may have been so common because there were an average of 3.44 speakers present other than the target child in the randomly selected clips (median = 3; range = 0–10), and (typically) more than half of the speakers were adults. However, these estimates may be comparable to those for North American infants (6–7 months) living in nuclear family homes [@bergelson2018day], so a high incidence of ODS may be common for infants in many sociocultural contexts.

Illustration of a transcript clip between the target child (TCH), an older sister (SIS), and mother (MOT) in which transitions between the target child and other interlocutors are marked in solid and dashed lines and in which interactional sequences are marked with dotted lines. Light gray boxes indicate TCDS and dark gray boxes indicate ODS.


Target-child-to-other turn transitions (TC–O)

We detect contingent turn exchanges between the target child and other speakers based on turn timing (Figure 6). If a child’s vocalization is followed by a target-child-directed utterance within -1000 to 2000 msec of the end of the child’s vocalization (i.e., with up to 1000 msec of overlap or a gap of up to 2000 msec) [@hilbrink2015early; @casillas2016turn], it is counted as a contingent response (i.e., a TC–O transition). We use the same idea to find other-to-target-child transitions below (i.e., a target-child-directed utterance followed by a target child vocalization with the same overlap/gap restrictions). Each target child vocalization can only have one prompt and one response, and each target-child-directed utterance can maximally count once as a prompt and once as a response (e.g., in a TC–O–TC sequence, the ‘O’ is both a response and a prompt).

Gap and overlap restrictions are based on prior studies of infants’ and young children’s turn taking [@hilbrink2015early; @casillas2016turn], though the timing margins are increased slightly for the current dataset because the prior estimates come from relatively short, intense bouts of interaction in WEIRD parental contexts. Note, too, that much prior work has used maximum gaps of similar or greater length to detect verbal contingencies in caregiver-child interaction, and any work based on LENA^TM conversational blocks is thereby based on a 5-second silence maximum [@vanegeren2001mother; @bornstein2015mother; @broesch2016similarities; @kuchirko2017becoming; @bergelsoncasillas2018what; @romeo2018beyond]; in comparison, our timing restrictions are quite stringent.
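
The timing rule for TC–O transitions can be sketched in a few lines of base R. This is a minimal illustration, not the script used for the analyses. The function `find_tc_o`, the column names `start` and `stop` (in msec), and the toy data are all hypothetical. The sketch also makes two assumptions that the text does not spell out: the window is applied to the onset of the target-child-directed utterance, and when several utterances qualify, the earliest unused one is taken as the response.

```r
# Hypothetical helper: match each target-child vocalization to at most one
# contingent target-child-directed (TCDS) utterance.
# chi, tcds: data frames with columns start and stop (msec).
find_tc_o <- function(chi, tcds, min_gap = -1000, max_gap = 2000) {
  chi  <- chi[order(chi$stop), ]
  tcds <- tcds[order(tcds$start), ]
  used <- rep(FALSE, nrow(tcds))           # an utterance is a response only once
  response <- rep(NA_integer_, nrow(chi))  # row of tcds that responds, if any
  for (i in seq_len(nrow(chi))) {
    gap <- tcds$start - chi$stop[i]        # negative gap = overlap
    ok <- which(!used & tcds$start > chi$start[i] &
                  gap >= min_gap & gap <= max_gap)
    if (length(ok) > 0) {
      response[i] <- ok[1]                 # assumption: earliest eligible onset
      used[ok[1]] <- TRUE
    }
  }
  data.frame(chi_start = chi$start, chi_stop = chi$stop, response = response)
}

# Toy data: three child vocalizations and three TCDS utterances
chi  <- data.frame(start = c(0, 5000, 12000), stop = c(800, 5600, 12500))
tcds <- data.frame(start = c(1200, 9000, 12100), stop = c(2500, 10000, 13500))

tc_o   <- find_tc_o(chi, tcds)
n_tc_o <- sum(!is.na(tc_o$response))  # number of TC-O transitions
tc_o_per_min <- n_tc_o / 5            # rate for a hypothetical 5-minute clip
```

In the toy data, the first vocalization is answered after a 400 msec gap and the third in 400 msec of overlap, while the second has no utterance inside the window. Swapping the roles of the two data frames gives the corresponding O–TC sketch.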

# Graph the basic turn taking rate info
# CHI-OTH transitions per minute
chi.oth.tts.rand_and_tt <- ggplot(turn.transitions.rand_and_tt,
                          aes(x = age_mo_round, y = n.c_o.tpm, lty = sample)) +
  geom_boxplot(aes(group = interaction(age_mo_round, sample),
                   color = sample), fill = "white", outlier.shape = NA,
               lty = "solid", alpha = 0.4) +
  geom_smooth(aes(fill = sample, color = sample), method = "lm") +
  ylab("CHI-OTH tts/min") + xlab("")    +
  scale_y_continuous(limits=c(0,30),
                     breaks=seq(0,30,5)) +
  scale_x_continuous(limits=c(0,38),
                     breaks=seq(0,38,6)) +
  coord_cartesian(ylim=c(0,30),xlim=c(0,38)) +
  scale_color_manual(values = viridis(3)) +
  scale_fill_manual(values = viridis(3)) +
  theme_apa() +
  theme(legend.position="none",
        axis.line = element_line(color="black", size = 0.4))

# OTH-CHI transitions per minute
oth.chi.tts.rand_and_tt <- ggplot(turn.transitions.rand_and_tt,
                          aes(x = age_mo_round, y = n.o_c.tpm, lty = sample)) +
  geom_boxplot(aes(group = interaction(age_mo_round, sample),
                   color = sample), fill = "white", outlier.shape = NA,
               lty = "solid", alpha = 0.4) +
  geom_smooth(aes(fill = sample, color = sample), method = "lm") +
  ylab("OTH-CHI tts/min") + xlab("Child age (mo)")  +
  scale_y_continuous(limits=c(0,30),
                     breaks=seq(0,30,5)) +
  scale_x_continuous(limits=c(0,38),
                     breaks=seq(0,38,6)) +
  coord_cartesian(ylim=c(0,30),xlim=c(0,38)) +
  scale_color_manual(values = viridis(3)) +
  scale_fill_manual(values = viridis(3)) +
  theme_apa() +
  theme(legend.position="none",
        axis.line = element_line(color="black", size = 0.4))

# Graph the basic sequence duration info
# plot per-clip averages so it's consistent with the rest
turn.sequences.rand_and_tt.byclip <- turn.sequences.rand_and_tt %>%
  group_by(aclew_child_id, age_mo_round, sample, segment) %>%
  summarise(m.seqdur.sec = mean(seq.dur*60))
seq.dur.rand_and_tt <- ggplot(turn.sequences.rand_and_tt.byclip,
                          aes(x = age_mo_round, y = m.seqdur.sec, lty = sample)) +
  geom_boxplot(aes(group = interaction(age_mo_round, sample),
                   color = sample), fill = "white", outlier.shape = NA,
               lty = "solid", alpha = 0.4) +
  geom_smooth(aes(fill = sample, color = sample), method = "lm") +
  ylab("Seq. dur. (sec)") + xlab("")    +
  scale_y_continuous(limits=c(0,60),
                     breaks=seq(0,60,20)) +
  scale_x_continuous(limits=c(0,38),
                     breaks=seq(0,38,6)) +
  coord_cartesian(ylim=c(0,60),xlim=c(0,38)) +
  scale_color_manual(values = viridis(3)) +
  scale_fill_manual(values = viridis(3)) +
  theme_apa() +
  theme(legend.position="none",
        axis.line = element_line(color="black", size = 0.4))
By-child estimates of contingent responses per minute to the target child's vocalizations (left), contingent responses per minute by the target child to others' target-child-directed speech (middle), and the average duration of contingent interactional sequences (right). Each datapoint represents the value for a single clip within the random (purple; solid) or turn taking (green; dashed) samples. Bands on the solid linear trends show 95% CIs.


Other speakers responded contingently to the target children’s vocalizations at an average rate of 1.38 transitions per minute (median = 0.4; range = 0–8.6). We modeled TC–O transitions per minute in the random clips with a zero-inflated negative binomial regression, as described above.

## CHI-OTH tts/min random sample ####
#ggplot(turn.transitions.rand, aes(round(n.c_o.tpm,0))) + geom_histogram()
#sd(round(turn.transitions.rand$n.c_o.tpm,0))^2
#mean(round(turn.transitions.rand$n.c_o.tpm,0))
# mean isn't much smaller than variance
# zero inflated
c_o.tpm.rand.zinb <- glmmTMB(round(n.c_o.tpm,0) ~
                               tchiyr.std +
                               I(stthr.std^2) +
                               hsz.std +
                               nsk.std +
                               tchiyr.std:I(stthr.std^2) +
                               tchiyr.std:hsz.std +
                               tchiyr.std:nsk.std +
                               (1|aclew_child_id), #I(stthr.std^2)
                             data=turn.transitions.rand,
                             ziformula=~tchiyr.std+I(stthr.std^2)+nsk.std,
                             family="nbinom1")
#res = simulateResiduals(c_o.tpm.rand.zinb)
#plot(res, rank = T)
#summary(c_o.tpm.rand.zinb)
#tchiyr.std                  0.8227     0.4773   1.724   0.0848 .
#nsk.std                    -0.3252     0.1781  -1.826   0.0679 .
#tchiyr.std:I(stthr.std^2)  -6.4737     2.5737  -2.515   0.0119 *
#tchiyr.std:nsk.std          0.4724     0.2249   2.100   0.0357 *
# ZI:
#I(stthr.std^2) -27.0345    15.7215  -1.720   0.0855 .
#nsk.std         -3.1859     1.8285  -1.742   0.0815 .
# save for reporting
c_o.tpm.rand.zinb.COEF.agetime <-
  coef(summary(c_o.tpm.rand.zinb))[[1]]["tchiyr.std:I(stthr.std^2)",] 
c_o.tpm.rand.zinb.COEF.agensk <-
  coef(summary(c_o.tpm.rand.zinb))[[1]]["tchiyr.std:nsk.std",]
c_o.tpm.rand.zinb.COEF.ZItime <-
  coef(summary(c_o.tpm.rand.zinb))[[2]]["I(stthr.std^2)",]
c_o.tpm.rand.zinb.COEF.ZInsk <-
  coef(summary(c_o.tpm.rand.zinb))[[2]]["nsk.std",]

## CHI-OTH tts/min tt sample ####
#ggplot(turn.transitions.tt, aes(round(n.c_o.tpm,0))) + geom_histogram()
#sd(round(turn.transitions.tt$n.c_o.tpm,0))^2
#mean(round(turn.transitions.tt$n.c_o.tpm,0))
# mean isn't much smaller than variance
# not zero-inflated (nature of sample)
c_o.tpm.tt.nb <- glmmTMB(round(n.c_o.tpm,0) ~
                               tchiyr.std +
                               I(stthr.std^2) +
                               hsz.std +
                               nsk.std +
                               tchiyr.std:I(stthr.std^2) +
                               tchiyr.std:hsz.std +
                               tchiyr.std:nsk.std +
                               (1|aclew_child_id), #I(stthr.std^2)
                             data=turn.transitions.tt,
                             family="nbinom1")
#res = simulateResiduals(c_o.tpm.tt.nb)
#plot(res, rank = T)
#summary(c_o.tpm.tt.nb)
#nsk.std                   -0.26230    0.11332  -2.315   0.0206 *  
#tchiyr.std:I(stthr.std^2) -3.23951    1.58644  -2.042   0.0412 *

The rate at which children heard contingent responses from others was primarily influenced by factors relating to the child’s age. Older children heard more contingent responses than younger children when there were more speakers present (B = 0.47, SD = 0.22, z = 2.11, p = 0.03). Also, as with the speech quantity measures, younger children heard more contingent responses in the mornings and evenings, while this effect was less pronounced for older children (B = -6.46, SD = 2.56, z = -2.52, p = 0.01). There were no other significant effects on TC–O transition rate, and no significant effects in the zero-inflation model either.11

Other-to-target-child turn transitions (O–TC)

Tseltal children responded contingently to others’ target-child-directed vocalizations at an average rate of 1.17 transitions per minute (median = 0.2; range = 0–8.8). We modeled O–TC transitions per minute in the random clips with a zero-inflated negative binomial regression, as described above.

## OTH-CHI tts/min random sample ####
#ggplot(turn.transitions.rand, aes(round(n.o_c.tpm,0))) + geom_histogram()
#sd(round(turn.transitions.rand$n.o_c.tpm,0))^2
#mean(round(turn.transitions.rand$n.o_c.tpm,0))
# mean isn't much smaller than variance
# zero-inflated
o_c.tpm.rand.zinb <- glmmTMB(round(n.o_c.tpm,0) ~
                               tchiyr.std +
                               I(stthr.std^2) +
                               hsz.std +
                               nsk.std +
                               tchiyr.std:I(stthr.std^2) +
                               tchiyr.std:hsz.std +
                               tchiyr.std:nsk.std +
                               (1|aclew_child_id), #I(stthr.std^2)
                             data=turn.transitions.rand,
                             ziformula=~tchiyr.std+I(stthr.std^2)+nsk.std,
                             family="nbinom1")
#res = simulateResiduals(o_c.tpm.rand.zinb)
#plot(res, rank = T)
#summary(o_c.tpm.rand.zinb)
#tchiyr.std                  0.9148     0.4954   1.846  0.06481 .
#tchiyr.std:I(stthr.std^2)  -7.3117     2.6194  -2.791  0.00525 **
# ZI:
#I(stthr.std^2) -23.8084    13.8124  -1.724   0.0848 .
# save for reporting
o_c.tpm.rand.zinb.COEF.agetime <-
  coef(summary(o_c.tpm.rand.zinb))[[1]]["tchiyr.std:I(stthr.std^2)",]

## OTH-CHI tts/min tt sample ####
#ggplot(turn.transitions.tt, aes(round(n.o_c.tpm,0))) + geom_histogram()
#sd(round(turn.transitions.tt$n.o_c.tpm,0))^2
#mean(round(turn.transitions.tt$n.o_c.tpm,0))
# mean is much smaller than variance
# not really zero-inflated
o_c.tpm.tt.nb <- glmmTMB(round(n.o_c.tpm,0) ~
                               tchiyr.std +
                               I(stthr.std^2) +
                               hsz.std +
                               nsk.std +
                               tchiyr.std:I(stthr.std^2) +
                               tchiyr.std:hsz.std +
                               tchiyr.std:nsk.std +
                               (1|aclew_child_id), #I(stthr.std^2)
                             data=turn.transitions.tt,
                             family="nbinom1")
#res = simulateResiduals(o_c.tpm.tt.nb)
#plot(res, rank = T)
#summary(o_c.tpm.tt.nb)
#nsk.std                   -0.26280    0.11984  -2.193   0.0283 *  
#tchiyr.std:I(stthr.std^2) -3.08043    1.69437  -1.818   0.0691 .
# save for reporting
o_c.tpm.tt.nb.COEF.nsk <-
  coef(summary(o_c.tpm.tt.nb))[[1]]["nsk.std",]

The rate at which children respond contingently to others (O–TC turn transitions per minute) was similarly influenced by child age and time of day: older children were less likely than young children to show peak response rates in the morning and evening (B = -7.3, SD = 2.61, z = -2.8, p = 0.01). There were no further significant effects in the count or zero-inflation models.12

Sequence duration

Sequences of interaction include periods of contingent turn taking with at least one target child vocalization and one target-child-directed prompt or response from another speaker. We use the same mechanism as before to detect contingent TC–O and O–TC transitions, but also allow for speakers to continue with multiple vocalizations in a row (e.g., TC–O–O–TC–O; Figure 7). Sequences are bounded by the earliest and latest vocalization for which there is no contingent prompt/response, respectively. Each target child vocalization can only appear in one sequence, and many sequences have more than one child vocalization. Because sequence durations were not zero-inflated, we modeled them in the random clips with negative binomial regression.
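
The idea of chaining vocalizations into sequences can be illustrated with a simplified base R sketch. It is not the procedure used for the analyses. It assumes that all target child vocalizations and target-child-directed utterances sit in one data frame with hypothetical columns `speaker` (`"TC"` or `"O"`), `start`, and `stop` (msec), and it links each vocalization to the one before it whenever the onset falls within the same -1000 to 2000 msec window, regardless of who is speaking. The function name `build_sequences` is also hypothetical.

```r
# Hypothetical helper: group vocalizations into interactional sequences.
build_sequences <- function(voc, min_gap = -1000, max_gap = 2000) {
  voc <- voc[order(voc$start), ]
  n <- nrow(voc)
  # Gap between each onset and the previous offset (negative = overlap)
  gap <- voc$start[-1] - voc$stop[-n]
  # A new sequence starts whenever the gap falls outside the window
  voc$seq_id <- cumsum(c(TRUE, gap < min_gap | gap > max_gap))
  by_seq <- split(voc, voc$seq_id)
  # Keep only sequences with at least one child and one other vocalization
  keep <- vapply(by_seq, function(s)
    any(s$speaker == "TC") && any(s$speaker != "TC"), logical(1))
  data.frame(
    seq_id  = as.integer(names(by_seq)[keep]),
    dur_sec = unname(vapply(by_seq[keep], function(s)
      (max(s$stop) - min(s$start)) / 1000, numeric(1))),
    n_chi   = unname(vapply(by_seq[keep], function(s)
      sum(s$speaker == "TC"), integer(1)))
  )
}

# Toy clip: a TC-O-O-TC exchange, then an isolated utterance and vocalization
voc <- data.frame(
  speaker = c("TC", "O", "O", "TC", "O", "TC"),
  start   = c(0, 1000, 2600, 4000, 20000, 30000),
  stop    = c(800, 2500, 3500, 4700, 21000, 30500)
)
seqs <- build_sequences(voc)  # one sequence: 4.7 sec, 2 child vocalizations
```

The isolated utterance and the isolated child vocalization at the end of the toy clip are dropped because neither has a contingent partner.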

## Sequence duration random sample ####
#ggplot(turn.sequences.rand, aes(round((seq.dur*60),0))) + geom_histogram()
#sd(round((turn.sequences.rand$seq.dur*60),0))^2
#mean(round((turn.sequences.rand$seq.dur*60),0))
# mean is much smaller than variance
# non-zero values
turn.sequences.rand$uniq.segment <- paste0(turn.sequences.rand$aclew_child_id, "-",
                                          turn.sequences.rand$segment)
seqdur.sec.rand.nb <- glmmTMB(round((seq.dur*60),0) ~
                                tchiyr.std +
                                I(stthr.std^2) +
                                hsz.std +
                                nsk.std +
                                tchiyr.std:I(stthr.std^2) +
                                tchiyr.std:hsz.std +
                                tchiyr.std:nsk.std +
                                (1|uniq.segment) +
                                (1|aclew_child_id), #I(stthr.std^2)
                              data=turn.sequences.rand,
                              family="nbinom1")
#res = simulateResiduals(seqdur.sec.rand.nb)
#plot(res, rank = T)
#summary(seqdur.sec.rand.nb) # (no significant effects)
#tchiyr.std:hsz.std         0.18845    0.11010   1.712    0.087 .  
# save for reporting
seqdur.sec.rand.nb.COEF.agehsz <-
  coef(summary(seqdur.sec.rand.nb))[[1]]["tchiyr.std:hsz.std",]

## Sequence duration tt sample ####
#ggplot(turn.sequences.tt, aes(round((seq.dur*60),0))) + geom_histogram()
#sd(round((turn.sequences.tt$seq.dur*60),0))^2
#mean(round((turn.sequences.tt$seq.dur*60),0))
# mean is much smaller than variance
# non-zero values
turn.sequences.tt$uniq.segment <- paste0(turn.sequences.tt$aclew_child_id, "-",
                                          turn.sequences.tt$segment)
seqdur.sec.tt.nb <- glmmTMB(round((seq.dur*60),0) ~
                                tchiyr.std +
                                I(stthr.std^2) +
                                hsz.std +
                                nsk.std +
                                tchiyr.std:I(stthr.std^2) +
                                tchiyr.std:hsz.std +
                                tchiyr.std:nsk.std +
                                (1|uniq.segment) +
                                (1+I(stthr.std^2)|aclew_child_id),
                              data=turn.sequences.tt,
                              family="nbinom1")
#res = simulateResiduals(seqdur.sec.tt.nb)
#plot(res, rank = T)
#summary(seqdur.sec.tt.nb)
#tchiyr.std                -0.243090   0.100535  -2.418   0.0156 *  
#I(stthr.std^2)             2.773214   1.111067   2.496   0.0126 *  
#hsz.std                   -0.213865   0.095167  -2.247   0.0246 *  
#tchiyr.std:hsz.std        -0.206061   0.117386  -1.755   0.0792 . 
# save for reporting
seqdur.sec.tt.nb.COEF.age <-
  coef(summary(seqdur.sec.tt.nb))[[1]]["tchiyr.std",]
seqdur.sec.tt.nb.COEF.time <-
  coef(summary(seqdur.sec.tt.nb))[[1]]["I(stthr.std^2)",]
seqdur.sec.tt.nb.COEF.hsz <-
  coef(summary(seqdur.sec.tt.nb))[[1]]["hsz.std",]

We detected 311 interactional sequences in the 90 randomly selected clips, with an average sequence duration of 10.13 seconds (median = 7; range = 0.56–85.47). The average number of child vocalizations within these sequences was 3.75 (range = 1–29; median = 3). None of the predictors significantly impacted sequence duration (all p > 0.09).13

Peak interaction

As expected, the turn-taking clips featured a much higher rate of contingent turn transitions: the average TC–O transition rate was 7.73 transitions per minute (median = 7.8; range = 0–25) and the average O–TC rate was 7.56 transitions per minute (median = 6.2; range = 0–26). The interactional sequences were also longer on average: 12.27 seconds (median = 8.1; range = 0.55–61.22).

Crucially, children also heard much more TCDS in the turn-taking clips—13.28 min/hr (median = 13.65; range = 7.32–20.19)—while also hearing less ODS—11.93 min/hr (median = 10.18; range = 1.37–24.42).

We modeled each of these five speech environment measures with parallel models to those used above (with no zero-inflation model for TCDS, TC–O, and O–TC rates, given the nature of the sample). The impact of child age, time of day, household size, and number of speakers was qualitatively similar (basic sample comparisons are visualized in Figure 3, Figure 4, and Figure 6) between the randomly selected clips and these peak periods of interaction with the following exceptions: older children heard significantly less ODS (B = -0.49, SD = 0.19, z = -2.57, p = 0.01), the presence of more speakers significantly decreased children’s response rate to others’ vocalizations (B = -0.26, SD = 0.12, z = -2.19, p = 0.03), and children’s interactional sequences were shorter for older children (B = -0.24, SD = 0.1, z = -2.42, p = 0.02), shorter for children in large households (B = -0.21, SD = 0.1, z = -2.25, p = 0.02), and longer during peak periods in the mornings and afternoons (B = 2.76, SD = 1.11, z = 2.5, p = 0.01). Full model outputs can be compared in the Supplementary Materials.

Peak minutes in the day

# get average overall turn transition rate
ttrate.all.avg.bychild <- turn.transitions %>%
  filter(grepl("tt", segment)) %>%
  replace_na(list(tm1.tier = 0, tp1.tier = 0)) %>%
  mutate(tm1.bin = ifelse(tm1.tier == 0, 0, 1),
         tp1.bin = ifelse(tp1.tier == 0, 0, 1)) %>%
  group_by(aclew_child_id, segment) %>%
  summarise(oct.raw = sum(tm1.bin), cot.raw = sum(tp1.bin)) %>%
  mutate(segdur = ifelse(grepl('ext', segment), 5, 1)) %>%
  group_by(aclew_child_id) %>%
  summarise (alltts = sum(oct.raw + cot.raw), sampledur = sum(segdur)) %>%
  mutate(peakttrate = alltts/sampledur)
ttrate.all.avg <- mean(ttrate.all.avg.bychild$peakttrate)

# set up a table of tt onset times within the randomly sampled segments
tts.all.rand <- turn.transitions %>%
  filter(grepl("rand", segment)) %>%
  select(aclew_child_id, segment, tm1.stop, tp1.start)
rand.seg.starts <- seg.info %>%
  filter(grepl("rand", CodeName)) %>%
  select(aclew_id, CodeName, clipoffset.sec)
rand.seg.secs <- tibble()
for (i in 1:nrow(rand.seg.starts)) {
  seg.secs <- tibble(
    aclew_child_id = rand.seg.starts$aclew_id[i],
    segment = rand.seg.starts$CodeName[i],
    segoffset.sec = c(0:299), # seconds in a (5-min) random clip
    segtime.sec = seq(rand.seg.starts$clipoffset.sec[i],
                      (rand.seg.starts$clipoffset.sec[i] + 299), 1),
    bin.tt = 0
  )
  tts.all.rand.chi <- tts.all.rand %>%
    filter(aclew_child_id == rand.seg.starts$aclew_id[i],
           segment == rand.seg.starts$CodeName[i])
  # add in tm1s (timestamps are converted from ms to sec)
  tts.all.rand.chi.tm1 <- tts.all.rand.chi$tm1.stop[!is.na(tts.all.rand.chi$tm1.stop)]
  if (length(tts.all.rand.chi.tm1) > 0) {
    tm1.onsets.sec <- tts.all.rand.chi.tm1/1000
    # REMINDER: example use of findInterval:
    # findInterval(c(2,5,7,8), c(4,5,6,8,9,10)) => [1] 0 2 3 4
    tt.idx <- findInterval(tm1.onsets.sec, seg.secs$segtime.sec)
    # tabulate() counts repeated indices (several transitions in one second),
    # which `x[idx] <- x[idx] + 1` would count only once; index 0 is ignored
    seg.secs$bin.tt <- seg.secs$bin.tt + tabulate(tt.idx, nbins = nrow(seg.secs))
  }
  # add in tp1s
  tts.all.rand.chi.tp1 <- tts.all.rand.chi$tp1.start[!is.na(tts.all.rand.chi$tp1.start)]
  if (length(tts.all.rand.chi.tp1) > 0) {
    tp1.onsets.sec <- tts.all.rand.chi.tp1/1000
    tt.idx <- findInterval(tp1.onsets.sec, seg.secs$segtime.sec)
    seg.secs$bin.tt <- seg.secs$bin.tt + tabulate(tt.idx, nbins = nrow(seg.secs))
  }
  rand.seg.secs <- bind_rows(rand.seg.secs, seg.secs)
}

# get average tt rates in 1-min windows
rand.window.tts <- rand.seg.secs %>%
  filter(segoffset.sec >= 59) %>%
  select(-bin.tt) %>%
  mutate(ttr.min = 0)
for (chi in unique(rand.seg.secs$aclew_child_id)) {
  for (seg in unique(rand.seg.secs$segment)) {
    for (minstart in c(1:241)) {
      rand.window.tts$ttr.min[which(rand.window.tts$aclew_child_id == chi &
                              rand.window.tts$segment == seg &
                              rand.window.tts$segoffset.sec == minstart+59-1)] <-
        sum(subset(rand.seg.secs, aclew_child_id == chi &
                     segment == seg)$bin.tt[(minstart:(minstart+59))])
    }
  }
}
rand.window.tts <- rand.window.tts %>%
  left_join(select(ttrate.all.avg.bychild, c(aclew_child_id, peakttrate)),
            by = "aclew_child_id") %>%
  mutate(GE.chipeak = ifelse(ttr.min >= peakttrate, 1, 0),
         GE.allpeak = ifelse(ttr.min >= ttrate.all.avg, 1, 0))
write_csv(rand.window.tts, "ttr.random.1minwindow.csv")

# calculate # seconds in "peak" tt rates and duration of peaks
ttrGEpeak.segs <- rand.window.tts %>%
  filter(GE.allpeak == 1)
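# a new streak starts when the previous peak second *within the same clip*
# is not the immediately preceding second (rows are ordered by second)
ttrGEpeak.segs <- ttrGEpeak.segs %>%
  group_by(aclew_child_id, segment) %>%
  mutate(prevpeak = dplyr::lag(segoffset.sec, default = -1),
         newstreak = ifelse(prevpeak == (segoffset.sec - 1), 0, 1)) %>%
  ungroup() %>%
  select(-prevpeak)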
ttrGEpeaks <- filter(ttrGEpeak.segs, newstreak == 1)
ttrGEpeaks$end.sec <- 0
for (i in 1:nrow(ttrGEpeaks)) {
  chi <- ttrGEpeaks$aclew_child_id[i]
  seg <- ttrGEpeaks$segment[i]
  start <- ttrGEpeaks$segoffset.sec[i]
  start.idx <- which(ttrGEpeak.segs$aclew_child_id == chi &
                       ttrGEpeak.segs$segment == seg &
                       ttrGEpeak.segs$segoffset.sec == start)
  newstreaks <- which(ttrGEpeak.segs$aclew_child_id == chi &
                        ttrGEpeak.segs$segment == seg &
                        ttrGEpeak.segs$segoffset.sec > start &
                        ttrGEpeak.segs$newstreak == 1)
  if (length(newstreaks) > 0) {
    end.sec <- ttrGEpeak.segs$segoffset.sec[newstreaks[1] - 1]
  } else {
    end.sec <- ttrGEpeak.segs$segoffset.sec[max(
      which(ttrGEpeak.segs$aclew_child_id == chi &
              ttrGEpeak.segs$segment == seg &
              ttrGEpeak.segs$segoffset.sec >= start))]
  }
  ttrGEpeaks$end.sec[i] <- end.sec + 1
}
ttrGEpeaks <- ttrGEpeaks %>%
  mutate(start.sec = segoffset.sec - 59,
         peak.dur = end.sec - start.sec)
ttrGEpeaks.summary <- ttrGEpeaks %>%
  group_by(aclew_child_id) %>%
  summarise(npeaks = n(),
            pkdur.mean = mean(peak.dur),
            pkdur.sum = sum(peak.dur),
            pkdur.mph = ((pkdur.sum/60)/45) * 60) %>% # (peak mins/45 poss mins) * 60 for min/hr
  # add back zero estimates for children with no tt peaks in the random data
  full_join(select(ptcp.info, aclew_child_id), by = "aclew_child_id") %>%
  replace_na(list(npeaks = 0, pkdur.mean = 0, pkdur.sum = 0, pkdur.mph = 0))

ttrGEpeak.segs.chi <- rand.window.tts %>%
  filter(GE.chipeak == 1)
rand.window.tts <- rand.window.tts %>%
  mutate(endoffsetsec = segoffset.sec + 1,
         delta.allpeak = ttr.min - ttrate.all.avg) %>%
  left_join(ptcp.info)
ttr.random <- ggplot(rand.window.tts,
                          aes(x = as.factor(age_mo_round), y = ttr.min)) +
  geom_jitter(color = "gray80", size = 0.3) +
  geom_violin(color = "black", fill = "black") + 
  ylab("Turn transitions/min") + xlab("Age (months)") +
  scale_y_continuous(limits=c(-1, round(max(rand.window.tts$ttr.min) + 5, -1)),
                     breaks=seq(0, round(max(rand.window.tts$ttr.min) + 5, -1), 10)) +
  coord_cartesian(ylim=c(0, round(max(rand.window.tts$ttr.min) + 5, -1))) +
  geom_hline(yintercept = ttrate.all.avg, color = "red") +
  geom_point(color = "red", size = 2.5, aes(y = peakttrate)) +
#  scale_color_manual(values = viridis(3)) +
#  scale_fill_manual(values = viridis(3)) +
  theme_apa() +
  theme(legend.position="none",
        axis.line = element_line(color="black", size = 0.4))

We used these interactional characteristics to find similar “high turn taking” minutes in the random samples in order to extrapolate to the number of high-interactivity minutes in the whole day. To do this, we scanned each 60-second window (e.g., 0–60 sec, 1–61 sec, etc.14) of each random clip from each child and recorded the observed turn-transition rate. Only 6 of the 10 children showed at least one minute in their random sample that equalled or exceeded the grand average turn-transition rate (12.89 transitions per minute), and 7 of the 10 children showed at least one minute equalling or exceeding their own average turn-transition rate from their turn-taking samples, as shown in Figure 8. Among children who did show turn-taking “peaks” in their random data (i.e., rates at or above the sample average from the turn-taking segments), periods of “peak” interaction were relatively long, ranging in duration from an average of 0 to 103 seconds across the 6 children with such peaks.
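The window scan described above can be illustrated with a minimal sketch in base R. It is not the analysis code itself: the per-second counts are invented toy data, and only the 60-second window and the 12.89 threshold come from the text. It computes one rolling sum per window and then collapses consecutive above-threshold windows into streaks.

```r
# Hypothetical per-second transition counts for one 5-minute (300 s) clip
tt.per.sec <- rep(0, 300)
tt.per.sec[100:112] <- 1     # toy data: 13 transitions packed into 13 seconds

threshold <- 12.89           # grand average rate reported in the text
win <- 60                    # window length in seconds

# Rolling 60-second sums via cumulative sums: one value per window
cs <- c(0, cumsum(tt.per.sec))
n.win <- length(tt.per.sec) - win + 1    # number of full windows in the clip
ttr.min <- cs[(win + 1):(length(tt.per.sec) + 1)] - cs[1:n.win]

# Flag windows at or above threshold; collapse consecutive flags into streaks
at.peak <- ttr.min >= threshold
runs <- rle(at.peak)
streak.len <- runs$lengths[runs$values]  # consecutive peak windows per streak

# A streak of k overlapping windows spans k + (win - 1) seconds of the clip
peak.dur <- streak.len + (win - 1)
```

Because the windows overlap, the shortest possible peak under this definition lasts a full 60 seconds, and each additional consecutive peak window adds one second.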

Turn-transition rates, estimated over the last 60 seconds for each second of the random samples by child (nine 5-min clips each). The horizontal line indicates the group mean turn-transition rate in the turn-taking sample. The large points indicate the by-child mean turn-transition rate in the turn-taking sample.

Assuming approximately 12 waking hours, we therefore very roughly estimate that these Tseltal children spent an average of 100.16 minutes (1.67 hours) in high turn-taking, dyadic interaction during their recording day. However, the quantity of high turn-taking interaction varies enormously across children, starting at just a few minutes per day and topping out at 419.73 minutes (about 7 hours) in our sample.
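The extrapolation can be sketched as follows, assuming the per-hour rate is computed as in the summary code above (peak minutes out of 45 sampled minutes, scaled to 60) and then multiplied by 12 waking hours. The function name `peak.mins.per.day` is hypothetical. The input of 1574 seconds is illustrative: it was chosen because it yields the reported maximum under this formula, and it is not read from the data files.

```r
# Hypothetical helper: scale peak seconds in the random sample to a waking day
peak.mins.per.day <- function(peak.secs, sampled.mins = 45, waking.hrs = 12) {
  # peak minutes per sampled minute, expressed as minutes per hour
  peak.mins.per.hr <- ((peak.secs / 60) / sampled.mins) * 60
  # assumption: the sampled rate holds across all waking hours
  peak.mins.per.hr * waking.hrs
}

est <- peak.mins.per.day(1574)  # illustrative input, in seconds
round(est, 2)                   # 419.73 minutes, i.e., about 7 hours
```

The estimate is only as good as the assumption that nine random 5-minute clips represent the whole waking day, which is why the text calls it very rough.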

Discussion

Future directions

Conclusion

Acknowledgements

References

r_refs(file = "Tseltal-CLE.bib")


  1. Throughout this article, we use ‘child-directed speech’ and ‘CDS’ in the most literal sense: speech designed for and directed toward a child recipient.

  2. For a review of comparative work on language socialization in Mayan cultures, see Pye [-@pye2017comparative].

  3. Typically, the CDS-features measured in these studies correlated between short- and long-format recordings, but with some caveats.

  4. Documentation and scripts for post-processing are available at and https://github.com/marisacasillas/Weave.

  5. Full documentation, including training materials, for the ACLEW Annotation Scheme can be found at https://osf.io/b2jep/wiki/home/.

  6. The turn-taking clips included in this analysis are: the 5 one-minute turn-taking clips and also the five-minute ‘extension’ clip for that recording if it was an extension of a turn-taking clip.

  7. The data and analysis code are freely available on the web ([retracted for review]), as is a summary of the results which will be updated as more transcriptions become available ([retracted for review]).

  8. We convert the original estimates from @shneidman2010language into min/hr by using the median utterance duration in our dataset for all non-target child speakers (1029 ms). Note that, though this conversion is far from perfect, Yukatek and Tseltal are related languages.

  9. This TCDS zero-inflation model did not include the number of speakers present or time of day.

  10. This ODS count model did not include by-child intercepts of time of day and its zero-inflation did not include the number of speakers present.

  11. This TC–O transition count model did not include by-child intercepts of time of day.

  12. This O–TC transition count model did not include by-child intercepts of time of day.

  13. This sequence duration model did not include by-child intercepts of time of day.

  14. Sixty seconds is the shortest clip duration among the turn-taking segments.